1: Geospatial Analytics for Public Good

Author

Magdalene Chan

Published

November 25, 2023

Modified

December 3, 2023

Objectives

As city-wide urban infrastructures become increasingly digital, datasets from technologies like GPS and RFID on vehicles offer opportunities to track movement patterns over space and time. For instance, smart cards and GPS devices on public buses collect routes and ridership data, containing valuable structure and patterns for understanding human movement and behavior within cities. Despite their potential, the practical use of these extensive location-aware datasets often remains limited to basic tracking and mapping within GIS applications due to the lack of comprehensive spatial and spatio-temporal analysis functions in conventional GIS tools.

Exploratory Spatial Data Analysis (ESDA) holds tremendous potential to address such complex problems. In this study, appropriate Local Indicators of Spatial Association (LISA) and Emerging Hot Spot Analysis (EHSA) will be applied to uncover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.

Tasks

The following tasks will be undertaken in this exercise:

Geovisualisation and Analysis

  1. Compute the passenger trips generated by origin at a hexagon level of 250m for the following time periods:
    1. Weekday morning peak – 6am to 9am (inclusive)
    2. Weekday evening peak – 5pm to 8pm (inclusive)
    3. Weekend/holiday morning peak – 11am to 2pm (inclusive)
    4. Weekend/holiday evening peak – 4pm to 7pm (inclusive)
  2. Display the geographical distribution of the passenger trips using appropriate geovisualisation methods.
  3. Describe the spatial patterns revealed by the geovisualisation (not more than 200 words per visual).

Emerging Hot Spot Analysis

With reference to the passenger trips generated by origin at the hexagon level for the four time periods given above:

  1. Perform the Mann-Kendall Test using the spatio-temporal local Gi* values.
  2. Prepare EHSA maps of the Gi* values of the passenger trips by origin at the hexagon level. The maps should only display the significant locations (i.e. p-value < 0.05).
  3. With reference to the EHSA maps and data visualisations prepared, describe the spatial patterns revealed (not more than 250 words per cluster).

Getting Started

The code chunk below uses p_load() of the pacman package to check whether the required packages are installed on the computer; any missing packages are installed, and all of them are then loaded. The following packages will be used:

  • sf package is used for importing, managing, and processing geospatial data.
  • sfdep package is used to create spatial weights matrix and LISA objects using the sf class to represent spatial data.
  • tmap package is used for thematic mapping.
  • plotly package is used to create interactive graphs and charts.
  • tidyverse package is used for aspatial data wrangling.
  • knitr package is used for dynamic report generation in R.
Code
pacman::p_load(sf, sfdep, tmap, plotly, tidyverse, knitr)

Import Data

The data sets used are:

  • Bus Stop Location (Last updated Jul 2023) from LTADataMall retrieved on 18 Nov 2023
  • Passenger Volume by Origin Destination Bus Stops for August 2023 from LTADataMall retrieved on 18 Nov 2023

Import Geospatial Data: Bus Stop Locations

The code chunk below uses the st_read() function of sf package to import BusStop shapefile into R as a simple feature data frame called BusStop. As BusStop uses svy21 projected coordinate system, the crs is set to 3414.

Code
BusStop <- st_read(dsn = "data/geospatial", 
                layer = "BusStop") %>%
  st_transform(crs=3414)
Reading layer `BusStop' from data source 
  `C:\magdalenecjw\ISSS624 Geospatial\Take_Home_Exercise\Ex1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

There are a total of 5161 features.

The structure of the data frame is examined using the code chunk below.

Code
str(BusStop)
Classes 'sf' and 'data.frame':  5161 obs. of  4 variables:
 $ BUS_STOP_N: chr  "22069" "32071" "44331" "96081" ...
 $ BUS_ROOF_N: chr  "B06" "B23" "B01" "B05" ...
 $ LOC_DESC  : chr  "OPP CEVA LOGISTICS" "AFT TRACK 13" "BLK 239" "GRACE INDEPENDENT CH" ...
 $ geometry  :sfc_POINT of length 5161; first list element:  'XY' num  13576 32884
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA
  ..- attr(*, "names")= chr [1:3] "BUS_STOP_N" "BUS_ROOF_N" "LOC_DESC"

Based on the data frame structure seen above, BUS_STOP_N is listed as a character variable. As this variable will be used as the identifier to link to the aspatial data, it should be transformed to a factor so that R treats it as a grouping variable.

Code
# Apply as.factor() to the column
BusStop$BUS_STOP_N <- as.factor(BusStop$BUS_STOP_N)

# Re-examine the structure of the data frame
str(BusStop)
Classes 'sf' and 'data.frame':  5161 obs. of  4 variables:
 $ BUS_STOP_N: Factor w/ 5145 levels "01012","01013",..: 1008 1724 2118 4972 431 3787 1160 2941 1610 4983 ...
 $ BUS_ROOF_N: chr  "B06" "B23" "B01" "B05" ...
 $ LOC_DESC  : chr  "OPP CEVA LOGISTICS" "AFT TRACK 13" "BLK 239" "GRACE INDEPENDENT CH" ...
 $ geometry  :sfc_POINT of length 5161; first list element:  'XY' num  13576 32884
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA
  ..- attr(*, "names")= chr [1:3] "BUS_STOP_N" "BUS_ROOF_N" "LOC_DESC"

Based on the output above, BUS_STOP_N is now a factor.

It is good practice to check for duplicated records before continuing further with the data wrangling.

Code
duplicate <- BusStop[duplicated(st_geometry(BusStop)) | duplicated(st_geometry(BusStop), 
                                                                   fromLast = TRUE), ]

duplicate
Simple feature collection with 2 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 42187.23 ymin: 34995.78 xmax: 42187.23 ymax: 34995.78
Projected CRS: SVY21 / Singapore TM
     BUS_STOP_N BUS_ROOF_N        LOC_DESC                  geometry
3264      96319       <NA> Yusen Logistics POINT (42187.23 34995.78)
3265      96319        NIL YUSEN LOGISTICS POINT (42187.23 34995.78)

As there are duplicated records found, the code chunk below will be used to retain the unique records, taking the first occurrence of each duplicated record.

Code
BusStop <- BusStop[!duplicated(st_geometry(BusStop)), ] %>%
  mutate(BUS_ROOF_N = ifelse(is.na(BUS_ROOF_N), "NIL", BUS_ROOF_N))

# Re-examine the structure of the data frame
str(BusStop)
Classes 'sf' and 'data.frame':  5160 obs. of  4 variables:
 $ BUS_STOP_N: Factor w/ 5145 levels "01012","01013",..: 1008 1724 2118 4972 431 3787 1160 2941 1610 4983 ...
 $ BUS_ROOF_N: chr  "B06" "B23" "B01" "B05" ...
 $ LOC_DESC  : chr  "OPP CEVA LOGISTICS" "AFT TRACK 13" "BLK 239" "GRACE INDEPENDENT CH" ...
 $ geometry  :sfc_POINT of length 5160; first list element:  'XY' num  13576 32884
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA
  ..- attr(*, "names")= chr [1:3] "BUS_STOP_N" "BUS_ROOF_N" "LOC_DESC"
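The behaviour of the duplicated() masks used above can be traced on a toy vector (the values here are illustrative only):

```r
# Toy vector with one repeated value ("b")
x <- c("a", "b", "b", "c")

# duplicated() flags repeats occurring after the first occurrence
duplicated(x)                               # FALSE FALSE TRUE FALSE

# Dropping those repeats keeps the first occurrence of each value
x[!duplicated(x)]                           # "a" "b" "c"

# duplicated(fromLast = TRUE) flags earlier occurrences instead, so
# OR-ing the two masks marks every member of a duplicated group --
# the mask used earlier to list all duplicated records
x[duplicated(x) | duplicated(x, fromLast = TRUE)]   # "b" "b"
```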

There are now a total of 5160 observations after removing one duplicated record. However, a further check in the code chunk below shows that there should only be 5145 unique bus stop codes.

Code
n_distinct(BusStop$BUS_STOP_N)
[1] 5145

The code chunk below shows a list of duplicated bus stop codes. Based on the output below, there are 15 bus stops with duplicated records.

Code
duplicate <- BusStop %>%
  group_by(BUS_STOP_N) %>%
  filter(n() > 1) %>%
  arrange(BUS_STOP_N)

duplicate
Simple feature collection with 30 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 13488.02 ymin: 32594.17 xmax: 44144.57 ymax: 47934
Projected CRS: SVY21 / Singapore TM
# A tibble: 30 × 4
# Groups:   BUS_STOP_N [15]
   BUS_STOP_N BUS_ROOF_N LOC_DESC                        geometry
   <fct>      <chr>      <chr>                        <POINT [m]>
 1 11009      B04        Ghim Moh Ter         (23101.34 32594.17)
 2 11009      B04-TMNL   GHIM MOH TER         (23100.58 32604.36)
 3 22501      B02        Blk 662A              (13489.09 35536.4)
 4 22501      B02        BLK 662A             (13488.02 35537.88)
 5 43709      B06        BLK 644               (18963.42 36762.8)
 6 43709      B06        BLK 644              (18952.02 36751.83)
 7 47201      UNK        <NA>                 (22616.75 47793.68)
 8 47201      NIL        W'LANDS NTH STN         (22632.92 47934)
 9 51071      B21        MACRITCHIE RESERVOIR (28311.27 36036.92)
10 51071      B21        MACRITCHIE RESERVOIR (28282.54 36033.93)
# … with 20 more rows

The code chunk below uses the distinct() function to keep only the unique rows based on the BUS_STOP_N while preserving all other fields (.keep_all = TRUE). By default, distinct() keeps the first occurrence of each unique combination of values in the specified columns.

Code
BusStop <- BusStop %>%
  distinct(BUS_STOP_N, .keep_all = TRUE)

# Re-examine the structure of the data frame
str(BusStop)
Classes 'sf' and 'data.frame':  5145 obs. of  4 variables:
 $ BUS_STOP_N: Factor w/ 5145 levels "01012","01013",..: 1008 1724 2118 4972 431 3787 1160 2941 1610 4983 ...
 $ BUS_ROOF_N: chr  "B06" "B23" "B01" "B05" ...
 $ LOC_DESC  : chr  "OPP CEVA LOGISTICS" "AFT TRACK 13" "BLK 239" "GRACE INDEPENDENT CH" ...
 $ geometry  :sfc_POINT of length 5145; first list element:  'XY' num  13576 32884
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA
  ..- attr(*, "names")= chr [1:3] "BUS_STOP_N" "BUS_ROOF_N" "LOC_DESC"

There are now a total of 5145 observations, after removing the 15 duplicated bus stop codes.

Import Passenger Volume by Origin-Destination Bus Stops

The code chunk below uses the read_csv() function of the readr package (loaded as part of the tidyverse) to import the csv file into R.

Code
odbus <- read_csv("data/aspatial/origin_destination_bus_202308.csv")

# Examine the structure of the data frame
str(odbus)
spc_tbl_ [5,709,512 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ YEAR_MONTH         : chr [1:5709512] "2023-08" "2023-08" "2023-08" "2023-08" ...
 $ DAY_TYPE           : chr [1:5709512] "WEEKDAY" "WEEKENDS/HOLIDAY" "WEEKENDS/HOLIDAY" "WEEKDAY" ...
 $ TIME_PER_HOUR      : num [1:5709512] 16 16 14 14 17 17 17 17 7 17 ...
 $ PT_TYPE            : chr [1:5709512] "BUS" "BUS" "BUS" "BUS" ...
 $ ORIGIN_PT_CODE     : chr [1:5709512] "04168" "04168" "80119" "80119" ...
 $ DESTINATION_PT_CODE: chr [1:5709512] "10051" "10051" "90079" "90079" ...
 $ TOTAL_TRIPS        : num [1:5709512] 7 2 3 10 5 4 3 22 3 3 ...
 - attr(*, "spec")=
  .. cols(
  ..   YEAR_MONTH = col_character(),
  ..   DAY_TYPE = col_character(),
  ..   TIME_PER_HOUR = col_double(),
  ..   PT_TYPE = col_character(),
  ..   ORIGIN_PT_CODE = col_character(),
  ..   DESTINATION_PT_CODE = col_character(),
  ..   TOTAL_TRIPS = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Based on the data frame structure seen above, ORIGIN_PT_CODE and DESTINATION_PT_CODE are listed as character variables, though they are categorical in nature. As such, they should be transformed to factors so that R treats them as grouping variables.

Code
# Columns to convert to factors
columns_to_convert <- c("ORIGIN_PT_CODE", "DESTINATION_PT_CODE")

# Apply as.factor() to the adjusted columns
odbus[columns_to_convert] <- lapply(odbus[columns_to_convert], as.factor)

# Re-examine the structure of the data frame
str(odbus)
spc_tbl_ [5,709,512 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ YEAR_MONTH         : chr [1:5709512] "2023-08" "2023-08" "2023-08" "2023-08" ...
 $ DAY_TYPE           : chr [1:5709512] "WEEKDAY" "WEEKENDS/HOLIDAY" "WEEKENDS/HOLIDAY" "WEEKDAY" ...
 $ TIME_PER_HOUR      : num [1:5709512] 16 16 14 14 17 17 17 17 7 17 ...
 $ PT_TYPE            : chr [1:5709512] "BUS" "BUS" "BUS" "BUS" ...
 $ ORIGIN_PT_CODE     : Factor w/ 5067 levels "01012","01013",..: 104 104 4422 4422 2008 2008 832 832 779 355 ...
 $ DESTINATION_PT_CODE: Factor w/ 5071 levels "01012","01013",..: 239 239 4736 4736 691 691 807 807 234 107 ...
 $ TOTAL_TRIPS        : num [1:5709512] 7 2 3 10 5 4 3 22 3 3 ...
 - attr(*, "spec")=
  .. cols(
  ..   YEAR_MONTH = col_character(),
  ..   DAY_TYPE = col_character(),
  ..   TIME_PER_HOUR = col_double(),
  ..   PT_TYPE = col_character(),
  ..   ORIGIN_PT_CODE = col_character(),
  ..   DESTINATION_PT_CODE = col_character(),
  ..   TOTAL_TRIPS = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Based on the output above, ORIGIN_PT_CODE and DESTINATION_PT_CODE are now factors.

Extract Commuting Flow data

The code chunk below extracts commuting flows for the four target time periods. After computing the commuting flow for each Bus Stop × Day Type × Time Per Hour combination, missing combinations (when compared against the full list of combinations) are imputed with a value of 0.

Code
# Create a list of possible bus stop code, day type and time per hour for checking
BusStopList <- expand.grid(ORIGIN_PT_CODE = unique(BusStop$BUS_STOP_N),
                          DAY_TYPE = c("WEEKDAY", "WEEKENDS/HOLIDAY"),
                          TIME_PER_HOUR = unique(odbus$TIME_PER_HOUR))

# Compute commuting flow by bus stop code, day type and time per hour
odbus_recoded <- odbus %>%
  group_by(ORIGIN_PT_CODE, DAY_TYPE, TIME_PER_HOUR) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS)) 

# Identify missing Bus Stops and impute a 0 value
odbus_missing <- BusStopList %>%
  anti_join(odbus_recoded, by = c("ORIGIN_PT_CODE", "DAY_TYPE", "TIME_PER_HOUR")) %>%
  mutate(TRIPS = 0)

# Combine the above two data frames
odbus_long <- rbind(odbus_recoded, odbus_missing) %>%
    mutate(peak_period = case_when(
    DAY_TYPE == "WEEKDAY" & TIME_PER_HOUR >= 6 & TIME_PER_HOUR <=9 ~ 
      "weekday_morn",
    DAY_TYPE == "WEEKDAY" & TIME_PER_HOUR >= 17 & TIME_PER_HOUR <=20 ~ 
      "weekday_evening",
    DAY_TYPE == "WEEKENDS/HOLIDAY" & TIME_PER_HOUR >= 11 & TIME_PER_HOUR <=14 ~ 
      "weekend_morn",
    DAY_TYPE == "WEEKENDS/HOLIDAY" & TIME_PER_HOUR >= 16 & TIME_PER_HOUR <=19 ~ 
      "weekend_evening",
    TRUE ~ "non_peak")) 

# Pivot wider to form data frames with one bus stop code per row
odbus_wide_peak <- odbus_long %>%
  group_by(ORIGIN_PT_CODE, peak_period) %>%
  summarise(TOTAL_TRIPS = sum(TRIPS)) %>%
  pivot_wider(names_from = peak_period,
              values_from = TOTAL_TRIPS)

odbus_wide <- odbus_long %>%
  filter(!(peak_period == "non_peak")) %>%
  group_by(ORIGIN_PT_CODE, DAY_TYPE, TIME_PER_HOUR) %>%
  summarise(TOTAL_TRIPS = sum(TRIPS)) %>%
  mutate(Day_Time = paste(DAY_TYPE, TIME_PER_HOUR)) %>%
  ungroup %>%
  select(-c(DAY_TYPE, TIME_PER_HOUR)) %>%
  pivot_wider(names_from = Day_Time,
              values_from = TOTAL_TRIPS) %>%
  left_join(odbus_wide_peak)

Append Commuting Flow data with Bus Stop Locations

left_join() of dplyr is then used to join the geographical data and the commuting flow data table using the bus stop code as the common identifier. A left join ensures that the geospatial properties (geometry column) of the BusStop sf data frame are retained, while bus stops not found within the geospatial data set are dropped.

Code
odBusStop <- left_join(BusStop, odbus_wide, by = c("BUS_STOP_N" = "ORIGIN_PT_CODE"))

str(odBusStop)
Classes 'sf' and 'data.frame':  5145 obs. of  25 variables:
 $ BUS_STOP_N         : Factor w/ 5199 levels "01012","01013",..: 1008 1724 2118 4972 431 3787 1160 2941 1610 4983 ...
 $ BUS_ROOF_N         : chr  "B06" "B23" "B01" "B05" ...
 $ LOC_DESC           : chr  "OPP CEVA LOGISTICS" "AFT TRACK 13" "BLK 239" "GRACE INDEPENDENT CH" ...
 $ WEEKDAY 6          : num  0 0 497 67 46 ...
 $ WEEKDAY 7          : num  9 0 623 128 36 ...
 $ WEEKDAY 8          : num  1 0 631 62 60 ...
 $ WEEKDAY 9          : num  5 0 373 50 48 ...
 $ WEEKDAY 17         : num  19 0 654 254 47 ...
 $ WEEKDAY 18         : num  17 0 485 104 60 ...
 $ WEEKDAY 19         : num  6 0 382 33 58 70 149 284 841 221 ...
 $ WEEKDAY 20         : num  1 0 221 8 30 42 18 140 610 91 ...
 $ WEEKENDS/HOLIDAY 11: num  0 0 178 27 12 57 3 415 502 234 ...
 $ WEEKENDS/HOLIDAY 12: num  2 0 201 39 32 44 3 447 551 246 ...
 $ WEEKENDS/HOLIDAY 13: num  1 0 183 17 24 27 4 320 560 206 ...
 $ WEEKENDS/HOLIDAY 14: num  3 0 138 27 13 30 0 254 357 172 ...
 $ WEEKENDS/HOLIDAY 16: num  6 1 130 21 21 34 17 131 465 232 ...
 $ WEEKENDS/HOLIDAY 17: num  4 0 151 23 10 41 37 133 541 171 ...
 $ WEEKENDS/HOLIDAY 18: num  1 0 135 29 11 57 15 142 499 167 ...
 $ WEEKENDS/HOLIDAY 19: num  1 0 130 11 19 41 15 91 368 140 ...
 $ non_peak           : num  163 6 3813 935 412 ...
 $ weekday_evening    : num  43 0 1742 399 195 ...
 $ weekday_morn       : num  15 0 2124 307 190 ...
 $ weekend_evening    : num  12 1 546 84 61 ...
 $ weekend_morn       : num  6 0 700 110 81 ...
 $ geometry           :sfc_POINT of length 5145; first list element:  'XY' num  13576 32884
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:24] "BUS_STOP_N" "BUS_ROOF_N" "LOC_DESC" "WEEKDAY 6" ...

There are a total of 5145 observations, which matches the number of features in the BusStop sf data frame.

Geovisualisation and Analysis

Create Spatial Grids

Spatial grids are commonly used in spatial analysis to divide the study area into equal-sized, regular polygons that tessellate the whole study area, after which a variable is summarised within each polygon. For this exercise, a hexagon layer of 250m (where 250m is the perpendicular distance between the hexagon centre and its edges) will be created.

In the code chunk below, st_make_grid() is used to create the hexagon grid using the svy21 projected coordinate system. The resulting geometry set is then converted into an sf data frame and an index is added for each hexagon.

In st_make_grid(), the cellsize argument is specified in the units of the spatial data's crs and corresponds to the distance between opposite edges of each hexagon. Since the data uses the SVY21 projected coordinate system, which is in metres, the value is set to c(500, 500) to create a hexagon layer of 250m.
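As a quick sanity check on the geometry described above (plain arithmetic, not an sf call): for a regular hexagon, the distance between opposite edges is twice the apothem, so a cellsize of 500m yields the intended 250m hexagons.

```r
apothem  <- 250                      # centre-to-edge distance in metres
cellsize <- 2 * apothem              # distance between opposite edges
side     <- cellsize / sqrt(3)       # side length of the hexagon
area_m2  <- 2 * sqrt(3) * apothem^2  # hexagon area in square metres

cellsize          # 500
round(area_m2)    # 216506, i.e. roughly 21.7 ha per hexagon
```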

Code
hex_grid <- st_make_grid(odBusStop, cellsize = c(500, 500), 
                         crs = 3414, what = "polygons", square = FALSE) %>%
  st_sf() %>%
  mutate(index = row_number())

In the code chunk below, st_join() with join = st_intersects is used to return the list of bus stops lying within each hexagon grid. A left join (left = TRUE) ensures that the geospatial properties (geometry column) of the hex_grid sf data frame are retained. After filtering out hexagon grids that do not contain any bus stops, the number of bus stops and the total number of trips in each time period are computed for each hexagon grid.

Code
hex_grid_count <- st_join(hex_grid, odBusStop, join = st_intersects, left = TRUE) %>%
  filter(!(is.na(BUS_STOP_N))) %>%
  group_by(index) %>%
  summarise(
    count = n(),
    bus_stop_codes = paste(BUS_STOP_N, collapse = ", "),
    bus_stop_names = paste(LOC_DESC, collapse = ", "),
    across(starts_with("week"), sum)
  ) %>%
  ungroup()

The hex_grid_count sf data frame can now be used for plotting the geographical distribution of peak hour bus stops and commuting flow using tmap.

Visualisation of Geographical Distribution of Commuting Flow

For efficiency of plotting, select just the columns that will be used for this section.

Code
hex_grid_count2 <- hex_grid_count %>%
  select(index, count, bus_stop_codes, bus_stop_names, geometry,
         weekday_evening, weekday_morn, weekend_evening, weekend_morn)

The Jenks optimization method, also called the Jenks natural breaks classification method, is a data clustering method designed to determine the best arrangement of values into different classes. This is done by seeking to minimize each class’s average deviation from the class mean, while maximizing each class’s deviation from the means of the other classes. In other words, the method seeks to reduce the variance within classes and maximize the variance between classes.

n=5 is used for this exercise.
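The classification criterion can be illustrated on a toy vector. This is not the Jenks algorithm itself, just the within-class variance measure it minimises; the helper wss() is made up for this sketch.

```r
# Toy data with an obvious natural break between 3 and 10
x <- c(1, 2, 3, 10, 11, 12)

# Within-class sum of squares for a given partition (a list of classes)
wss <- function(classes) {
  sum(sapply(classes, function(v) sum((v - mean(v))^2)))
}

# The partition at the natural break has far lower within-class variance
wss(list(c(1, 2, 3), c(10, 11, 12)))   # 4
wss(list(c(1, 2), c(3, 10, 11, 12)))   # 50.5
```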

Code
tmap_mode("view")

tm_shape(hex_grid_count2) +
  tm_fill(
    col = "weekday_morn",
    palette = "Blues",
    n = 5,
    style = "jenks",
    popup.vars = c("Number of Trips: " = "weekday_morn",
                   "Bus Stops:" = "bus_stop_names")
  )

Code
tm_shape(hex_grid_count2) +
  tm_fill(
    col = "weekday_evening",
    palette = "Oranges",
    n = 5,
    style = "jenks",
    popup.vars = c("Number of Trips: " = "weekday_evening",
                   "Bus Stops:" = "bus_stop_names")
  )

Code
tm_shape(hex_grid_count2) +
  tm_fill(
    col = "weekend_morn",
    palette = "Greens",
    n = 5,
    style = "jenks",
    popup.vars = c("Number of Trips: " = "weekend_morn",
                   "Bus Stops:" = "bus_stop_names")
  )

Code
tm_shape(hex_grid_count2) +
  tm_fill(
    col = "weekend_evening",
    palette = "Reds",
    n = 5,
    style = "jenks",
    popup.vars = c("Number of Trips: " = "weekend_evening",
                   "Bus Stops:" = "bus_stop_names")
  )

Emerging Hot Spot Analysis

Time Series Cube

Spatio-temporal data are typically presented in long format, where each row records an observation at a unique location and time, with one column identifying the time and another the location. However, such data are traditionally not linked to geographies: they contain only an identifier of the location, not the spatial representation that makes the data meaningful for geospatial analysis.

The spacetime class in the sfdep package can be used to create lattice data: a representation of spatio-temporal data for a set of regions, containing the geometries over a number of time periods. There are two possible layouts: spatio-temporal full grids and sparse grids.

Given a number of spatial features n, and time periods m, a spatio-temporal full grid contains n×m rows. Each location has a recorded observation for each of the time periods in m. When there are missing observations for some locations or time periods and they are entirely omitted from the data set, that is a spatio-temporal sparse grid.
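The distinction can be sketched with a toy long-format table (locations and time values here are arbitrary):

```r
# 3 locations observed over 4 time periods: a full grid has n x m rows
full_grid <- expand.grid(loc = 1:3, time = 1:4)
nrow(full_grid)       # 12 = 3 x 4

# Omitting location-time combinations with no recorded observation
# yields a sparse grid with fewer than n x m rows
sparse_grid <- full_grid[-c(2, 7), ]
nrow(sparse_grid)     # 10
```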

For this analysis, time series cubes will be created for each of the four target time periods respectively using the earlier created hex_grid_count.

Create Time Series Cube

In the code chunks below, the relevant columns corresponding to each target time period are selected from hex_grid_count and joined with hex_grid to create the n spatial features. The resulting sf data frame is then pivoted longer to create time series data containing the m time periods. Once the data frame has n×m rows, the spacetime() function is applied to create the time series cube.

Code
# Select relevant columns from hex_grid_count
weekday_morn_hex <- hex_grid_count %>%
  select(index, count, bus_stop_codes, bus_stop_names, geometry,
         `WEEKDAY 6`, `WEEKDAY 7`, `WEEKDAY 8`, `WEEKDAY 9`)

# Join with hex_grid and pivot longer to create time series data
weekday_morn_full <- st_join(hex_grid, weekday_morn_hex, join = st_within) %>%
  select(-c(index.y)) %>%
  rename(index = index.x) %>%
  rename_with(~gsub("WEEKDAY ", "", .), matches("^WEEKDAY \\d+$")) %>%
  pivot_longer(cols = c(`6`, `7`, `8`, `9`), 
               names_to = "TIME_PER_HOUR", values_to = "TRIPS",
               names_transform = as.integer, values_transform = as.integer) %>%
  mutate(across(where(is.integer), ~ifelse(is.na(.), 0, .)))

# Apply spacetime() function
weekday_morn_cube <- spacetime(weekday_morn_full, hex_grid,
                               .loc_col = "index", .time_col = "TIME_PER_HOUR")

# Verify that resultant output is a spacetime cube
is_spacetime_cube(weekday_morn_cube)
[1] TRUE

The TRUE return confirms that the weekday_morn_cube object is indeed a space-time cube.

Code
# Select relevant columns from hex_grid_count
weekday_evening_hex <- hex_grid_count %>%
  select(index, count, bus_stop_codes, bus_stop_names, geometry,
         `WEEKDAY 17`, `WEEKDAY 18`, `WEEKDAY 19`, `WEEKDAY 20`)

# Join with hex_grid and pivot longer to create time series data
weekday_evening_full <- st_join(hex_grid, weekday_evening_hex, join = st_within) %>%
  select(-c(index.y)) %>%
  rename(index = index.x) %>%
  rename_with(~gsub("WEEKDAY ", "", .), matches("^WEEKDAY \\d+$")) %>%
  pivot_longer(cols = c(`17`, `18`, `19`, `20`), 
               names_to = "TIME_PER_HOUR", values_to = "TRIPS",
               names_transform = as.integer, values_transform = as.integer) %>%
  mutate(across(where(is.integer), ~ifelse(is.na(.), 0, .)))

# Apply spacetime() function
weekday_evening_cube <- spacetime(weekday_evening_full, hex_grid,
                               .loc_col = "index", .time_col = "TIME_PER_HOUR")

# Verify that resultant output is a spacetime cube
is_spacetime_cube(weekday_evening_cube)
[1] TRUE

The TRUE return confirms that the weekday_evening_cube object is indeed a space-time cube.

Code
# Select relevant columns from hex_grid_count
weekend_morn_hex <- hex_grid_count %>%
  select(index, count, bus_stop_codes, bus_stop_names, geometry,
         `WEEKENDS/HOLIDAY 11`, `WEEKENDS/HOLIDAY 12`, 
         `WEEKENDS/HOLIDAY 13`, `WEEKENDS/HOLIDAY 14`)

# Join with hex_grid and pivot longer to create time series data
weekend_morn_full <- st_join(hex_grid, weekend_morn_hex, join = st_within) %>%
  select(-c(index.y)) %>%
  rename(index = index.x) %>%
  rename_with(~gsub("WEEKENDS/HOLIDAY ", "", .), matches("^WEEKENDS/HOLIDAY \\d+$")) %>%
  pivot_longer(cols = c(`11`, `12`, `13`, `14`), 
               names_to = "TIME_PER_HOUR", values_to = "TRIPS",
               names_transform = as.integer, values_transform = as.integer) %>%
  mutate(across(where(is.integer), ~ifelse(is.na(.), 0, .)))

# Apply spacetime() function
weekend_morn_cube <- spacetime(weekend_morn_full, hex_grid,
                               .loc_col = "index", .time_col = "TIME_PER_HOUR")

# Verify that resultant output is a spacetime cube
is_spacetime_cube(weekend_morn_cube)
[1] TRUE

The TRUE return confirms that the weekend_morn_cube object is indeed a space-time cube.

Code
# Select relevant columns from hex_grid_count
weekend_evening_hex <- hex_grid_count %>%
  select(index, count, bus_stop_codes, bus_stop_names, geometry,
         `WEEKENDS/HOLIDAY 16`, `WEEKENDS/HOLIDAY 17`, 
         `WEEKENDS/HOLIDAY 18`, `WEEKENDS/HOLIDAY 19`)

# Join with hex_grid and pivot longer to create time series data
weekend_evening_full <- st_join(hex_grid, weekend_evening_hex, join = st_within) %>%
  select(-c(index.y)) %>%
  rename(index = index.x) %>%
  rename_with(~gsub("WEEKENDS/HOLIDAY ", "", .), matches("^WEEKENDS/HOLIDAY \\d+$")) %>%
  pivot_longer(cols = c(`16`, `17`, `18`, `19`), 
               names_to = "TIME_PER_HOUR", values_to = "TRIPS",
               names_transform = as.integer, values_transform = as.integer) %>%
  mutate(across(where(is.integer), ~ifelse(is.na(.), 0, .)))

# Apply spacetime() function
weekend_evening_cube <- spacetime(weekend_evening_full, hex_grid,
                               .loc_col = "index", .time_col = "TIME_PER_HOUR")

# Verify that resultant output is a spacetime cube
is_spacetime_cube(weekend_evening_cube)
[1] TRUE

The TRUE return confirms that the weekend_evening_cube object is indeed a space-time cube.

Computation of Local Gi* statistics

To compute the local Gi* statistics, spatial weights must first be derived. The code chunks below identify neighbours and derive inverse distance weights for each target time period. activate() is used to activate the geometry context, followed by mutate() to create two new columns, nb and wt. The data context is then activated again, and the nb and wt columns are copied over to each time slice using set_nbs() and set_wts().

Code
weekday_morn_nb <- weekday_morn_cube %>%
  activate("geometry") %>%
  mutate(nb = include_self(st_contiguity(geometry)),
         wt = st_inverse_distance(nb, geometry,
                                  scale = 1,
                                  alpha = 1),
         .before = 1) %>%
  set_nbs("nb") %>%
  set_wts("wt")
Code
weekday_evening_nb <- weekday_evening_cube %>%
  activate("geometry") %>%
  mutate(nb = include_self(st_contiguity(geometry)),
         wt = st_inverse_distance(nb, geometry,
                                  scale = 1,
                                  alpha = 1),
         .before = 1) %>%
  set_nbs("nb") %>%
  set_wts("wt")
Code
weekend_morn_nb <- weekend_morn_cube %>%
  activate("geometry") %>%
  mutate(nb = include_self(st_contiguity(geometry)),
         wt = st_inverse_distance(nb, geometry,
                                  scale = 1,
                                  alpha = 1),
         .before = 1) %>%
  set_nbs("nb") %>%
  set_wts("wt")
Code
weekend_evening_nb <- weekend_evening_cube %>%
  activate("geometry") %>%
  mutate(nb = include_self(st_contiguity(geometry)),
         wt = st_inverse_distance(nb, geometry,
                                  scale = 1,
                                  alpha = 1),
         .before = 1) %>%
  set_nbs("nb") %>%
  set_wts("wt")
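The shape of the inverse distance weights can be sketched in base R, assuming the common formulation w = 1 / (d / scale)^alpha; check the sfdep documentation of st_inverse_distance() for the exact definition. The helper inv_dist_wt() is made up for this sketch.

```r
# Inverse-distance weighting: nearer neighbours receive larger weights
inv_dist_wt <- function(d, scale = 1, alpha = 1) {
  1 / (d / scale)^alpha
}

d <- c(500, 1000, 2000)   # neighbour distances in metres
inv_dist_wt(d)            # 0.0020 0.0010 0.0005
```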

In the code chunks below, the new columns are used to calculate the local Gi* statistic for each location. This is done by grouping by TIME_PER_HOUR and using local_gstar_perm() of the sfdep package. tidyr::unnest() is then used to unnest the gi_star column of the newly created output.

Code
gi_weekday_morn <- weekday_morn_nb %>%
  group_by(TIME_PER_HOUR) %>%
  mutate(gi_star = local_gstar_perm(TRIPS, nb, wt)) %>%
  tidyr::unnest(gi_star)
Code
gi_weekday_evening <- weekday_evening_nb %>%
  group_by(TIME_PER_HOUR) %>%
  mutate(gi_star = local_gstar_perm(TRIPS, nb, wt)) %>%
  tidyr::unnest(gi_star)
Code
gi_weekend_morn <- weekend_morn_nb %>%
  group_by(TIME_PER_HOUR) %>%
  mutate(gi_star = local_gstar_perm(TRIPS, nb, wt)) %>%
  tidyr::unnest(gi_star)
Code
gi_weekend_evening <- weekend_evening_nb %>%
  group_by(TIME_PER_HOUR) %>%
  mutate(gi_star = local_gstar_perm(TRIPS, nb, wt)) %>%
  tidyr::unnest(gi_star)

Mann-Kendall Test

With the computed Gi* statistics, each location can be evaluated for trends using the Mann-Kendall Test. The code chunk below extracts the Gi* values of one example hexagon (index 1251) and plots them across the time periods.

Code
mk_weekday_morn <- gi_weekday_morn %>% 
  ungroup() %>% 
  filter(index == 1251) %>% 
  select(index, TIME_PER_HOUR, gi_star)

# Plot the result using ggplot2 functions
ggplot(data = mk_weekday_morn, 
       aes(x = TIME_PER_HOUR, 
           y = gi_star)) +
  geom_line() +
  theme_light()
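For context, the Mann-Kendall test statistic S counts concordant minus discordant pairs in the series; in practice a package function such as Kendall::MannKendall() would be applied to the gi_star series, but the statistic itself can be sketched in base R (mk_s() is a made-up helper for illustration):

```r
# S = sum over all i < j of sign(x_j - x_i); a strongly positive S
# suggests an increasing trend, a strongly negative S a decreasing one
mk_s <- function(x) {
  pairs <- combn(length(x), 2)
  sum(sign(x[pairs[2, ]] - x[pairs[1, ]]))
}

mk_s(c(1, 2, 3, 4))   # 6  : monotonically increasing
mk_s(c(4, 3, 2, 1))   # -6 : monotonically decreasing
```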